gh-138232: Improve performance of dataclasses.asdict by caching dataclass field names by eendebakpt · Pull Request #138233 · python/cpython

eendebakpt · 2025-08-28T21:04:47Z

Benchmark results with PGO on Linux:

asdict: Mean +- std dev: [main2] 2.13 us +- 0.10 us -> [pr2] 1.61 us +- 0.11 us: 1.32x faster
astuple: Mean +- std dev: [main2] 2.61 us +- 0.14 us -> [pr2] 2.15 us +- 0.12 us: 1.21x faster
f.__getstate__(): Mean +- std dev: [main2] 840 ns +- 41 ns -> [pr2] 339 ns +- 20 ns: 2.48x faster

Benchmark hidden because not significant (1): instance creation

Geometric mean: 1.41x faster

(script in issue)

Issue: Improve performance of dataclasses.asdict by caching field names #138232

picnixz

A few nitpicks. Memory-wise, we're already holding all the fields in _FIELDS, so holding their names shouldn't be an issue. Don't forget the NEWS + What's New entry if the change is accepted.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

eendebakpt · 2025-08-28T21:46:56Z

    # order, so the order of the tuple is as the fields were defined.
    return tuple(f for f in fields.values() if f._field_type is _FIELD)

+def _field_names(class_or_instance):


Inlining this function gives a small performance gain:

%timeit _field_names(s)
%timeit s.__dataclass_field_names__

Results in

41.8 ns ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
17.8 ns ± 0.188 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

picnixz

More generally, I wonder whether many small functions would actually benefit from a C implementation, as they seem to target micro-optimizations. I don't know which ones we could optimize, and I also don't know if it's worth it, but have we ever considered doing this?

Now, I think it's a good trade-off to increase memory per dataclass type but reduce quite drastically the time it takes to do asdict transformations.
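The trade-off can be illustrated outside of CPython with a minimal sketch. It assumes a hypothetical decorator `cache_field_names` and a hypothetical shallow converter `shallow_asdict`. Neither is part of `dataclasses`, and the real `asdict` also recurses into nested dataclasses and containers, which this sketch does not.

```python
from dataclasses import dataclass, fields


def cache_field_names(cls):
    """Hypothetical decorator: store the field names once, on the class."""
    # One extra tuple per dataclass type, computed at class-creation time.
    cls._cached_field_names = tuple(f.name for f in fields(cls))
    return cls


def shallow_asdict(obj):
    """Hypothetical shallow conversion that reuses the cached names."""
    # No call to fields() and no filtering on every conversion.
    return {name: getattr(obj, name) for name in obj._cached_field_names}


@cache_field_names
@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)
result = shallow_asdict(p)
print(result)
```

The memory cost is a single tuple of strings per class, while every conversion skips rebuilding the list of fields.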

picnixz · 2025-09-16T14:33:20Z

+
+    Accepts a dataclass or an instance of one. Excludes pseudo-fields.
+    """
+


My suggestion here is to remove the blank line after """. I still think it would be better to actually inline the function, but since we also use this approach for _PARAMS, keeping it would be more consistent.

eendebakpt · 2025-09-23T10:15:00Z

I addressed the review comments. In the last commit I inlined the _field_names method. That can be reverted though if needed.

picnixz · 2025-09-23T10:31:38Z

    # also marks this class as being a dataclass.
    setattr(cls, _FIELDS, fields)
+    # Store field names. Excludes pseudo-fields.
+    setattr(cls, _FIELD_NAMES, tuple(f.name for f in fields.values()


If you want, you can also do cls.__dataclass_field_names__ = ... directly (I would also advise you to consider this as part of the "inline" commit, so you can also force-push this one).

Another alternative is to only inline getattr(x, _FIELD_NAMES) instead of having a standalone function for that. It could also improve readability a bit and justify the need for the global _FIELD_NAMES.

IOW, choose among the following:

1. Hardcode __dataclass_field_names__ everywhere, without using getattr/setattr and _FIELD_NAMES.

2. Use _FIELD_NAMES + getattr/setattr(..., _FIELD_NAMES) directly.

3. Use _FIELD_NAMES + a module-wide function defined as the partial application _get_names(x) ~ getattr(x, _FIELD_NAMES).

My preference is (1) or (2); (3) is overkill IMO.

Ok, going for option (1)

albertedwardson · 2025-10-28T15:37:43Z

    # also marks this class as being a dataclass.
    setattr(cls, _FIELDS, fields)
+    # Store field names. Excludes pseudo-fields.
+    cls.__dataclass_field_names__ = tuple(f.name for f in fields.values()


hi! no one asked me for review, but I'd like to put in my two cents :)

I believe this could be faster by moving from tuple(genexp) to tuple([listcomp])

this is definitely not where most of the time is spent when creating a dataclass, but anyway :)
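The two spellings can be compared with a short sketch like the one below. `Record` is a hypothetical dataclass, and whether the list comprehension is actually faster depends on the interpreter version and the number of fields, so the timings are printed rather than assumed.

```python
import timeit
from dataclasses import dataclass, fields


@dataclass
class Record:
    a: int = 0
    b: int = 0
    c: int = 0
    d: int = 0


all_fields = fields(Record)


def names_genexp():
    # tuple() consumes a generator expression item by item.
    return tuple(f.name for f in all_fields)


def names_listcomp():
    # A temporary list is built first, then copied into a tuple.
    return tuple([f.name for f in all_fields])


t_genexp = timeit.timeit(names_genexp, number=100_000)
t_listcomp = timeit.timeit(names_listcomp, number=100_000)

print(f"tuple(genexp):     {t_genexp:.4f} s per 100,000 calls")
print(f"tuple([listcomp]): {t_listcomp:.4f} s per 100,000 calls")
```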

Improve performance of dataclasses by caching dataclass field names

07a4ee3

eendebakpt requested a review from ericvsmith as a code owner August 28, 2025 21:04

bedevere-app Bot added the awaiting review label Aug 28, 2025

bedevere-app Bot mentioned this pull request Aug 28, 2025

Improve performance of dataclasses.asdict by caching field names #138232

Open

picnixz reviewed Aug 28, 2025

Comment thread Lib/dataclasses.py Outdated

Apply suggestions from code review

ab5d43b

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

eendebakpt commented Aug 28, 2025

Comment thread Lib/dataclasses.py Outdated

Update Lib/dataclasses.py

20c603a

eendebakpt commented Aug 28, 2025

blurb-it Bot and others added 2 commits August 29, 2025 17:58

📜🤖 Added by blurb_it.

ea105e2

Merge branch 'main' into dataclass_field_names

f65d9a4

eendebakpt requested a review from picnixz September 10, 2025 18:47

picnixz approved these changes Sep 16, 2025

bedevere-app Bot added awaiting merge and removed awaiting review labels Sep 16, 2025

ericvsmith reviewed Sep 18, 2025

Comment thread Misc/NEWS.d/next/Library/2025-08-29-17-58-36.gh-issue-138232.-W4iaS.rst Outdated

eendebakpt added 3 commits September 23, 2025 12:09

address review comments

573b186

inline _field_names

5c8892b

Merge branch 'main' into dataclass_field_names

eef9041

picnixz reviewed Sep 23, 2025

eendebakpt added 2 commits September 23, 2025 23:08

inline part 2

b1419fd

Merge branch 'main' into dataclass_field_names

720ef42

eendebakpt changed the title ~~gh-138232: Improve performance of dataclasses by caching dataclass field names~~ gh-138232: Improve performance of dataclasses.asdict by caching dataclass field names Oct 18, 2025

albertedwardson reviewed Oct 28, 2025

eendebakpt wants to merge 10 commits into python:main from eendebakpt:dataclass_field_names

5 participants

